Dynomotion

Group: DynoMotion Message: 9380 From: k_dm927 Date: 4/8/2014
Subject: Linking in code compiled with cl6x or gcc
Has anyone successfully used TI's compiler (cl6x, TMS320C6x C/C++ Compiler) or gcc to build user programs for the KFLOP? I want to see how much of a speedup I can get from an optimizing compiler. I think I'm close to building a usable binary with cl6x, but I don't know how to get the cl6x-generated code into the right address space (0x80050000). It seems to put my functions and data near 0x0, and I can't find any way to relocate that code using either the compiler or the linker.

This is what I'm trying:

main.c: initialization code, main loop
slow.c: part of the program that takes most of the processing time

./c6000_7.4.7/bin/cl6x -mv 6722 slow.c -O2  --symdebug:none  -o slow.obj
./c6000_7.4.7/bin/lnk6x -o slow.out slow.obj -e _my_slow_func ./c6000_7.4.7/lib/rts6200.lib --hide=\* --unhide=_my_slow_func --unhide=_a_variable_in_slowc
wine ~/.wine/drive_c/KMotion430/KMotion/Release/TCC67.exe  -text 80050000 -g -nostdinc -o fw.out DSPKFLOP.out  main.c slow.out

Then `nm6x fw.out` gives me:

00000420 D _my_slow_func
000037e0 D _a_variable_in_slowc

and all of the other symbols are at >0x10000000 (mostly >0x80000000), except for '00000000 D ^X^U'. However, I suspect several other functions are also near 0x0: _sinf, _sqrtf, etc., which are taken from rts6200.lib. I would use DSPKFLOP.out's math functions, but I don't see how to do so. If I link slow.obj against DSPKFLOP.out while building slow.out, then tcc sees redefined symbols when I try to link slow.out, main.c and DSPKFLOP.out all together. And it looks like tcc doesn't link relocatable files, only executables, which I think need to have all of their symbols resolved.
Group: DynoMotion Message: 9381 From: Tom Kerekes Date: 4/9/2014
Subject: Re: Linking in code compiled with cl6x or gcc
Hi,

I don't know of anyone using the TI Optimizing Compiler.  I don't think a gcc compiler exists for the TI C6722 processor.  This is why we created the TCC67 compiler from TCC.

Code executes much faster using the (slow/expensive) TI Optimizing compiler.  But most User programs just make function calls to KFLOP's internal optimized routines so it doesn't make much difference.

Besides the Optimizing compiler, what can speed up code a lot is placing it in Internal DSP RAM.  There is only 128 KBytes of Internal RAM but it is really fast (single cycle and 256 bits wide).

I think you will need to make a .cmd file for the TI  Compile/Link to place the code in the right place.

I made an example.  See the attached Zip file (you will need to re-name it).  UnZip it as a directory TI_Compiler under the C Programs folder.  There is a Batch file and some linker files to compile an example to be run as Thread#2.

Unfortunately we don't have an easy way to import the symbols into the TI compiler from the KFLOP Binary the way we have for TCC67.  But if you only need to access a few functions in KFLOP you can hard code them in the linker file.

There is a procedure to follow included.  It is attached separately as well.

Check out the Video !

Regards
TK

Group: DynoMotion Message: 9396 From: k_dm927 Date: 4/10/2014
Subject: Re: Linking in code compiled with cl6x or gcc
Thanks! My program effectively got a 33x* speedup: 0.037 vs 1.25 ms.

I threw everything into a Makefile script. It compiles your code, locates necessary symbols, and links the program. I've put it at https://gist.github.com/ktossell/10439006 . Currently it's only set up for building programs that run from the internal RAM.

Does this linker script look all right?

/* autogenerated linker script for myprog, thread 3
 * source files: main.c leds.c
 */
-c
-heap 15700000
-stack 0x800
_ClearBit = 0x10010f28;
_SetBit = 0x10010d20;
_printf = 0x80023140;
_sinf = 0x1000ecd4;
MEMORY {
IRAM: o = 0x1001c000, l = 0x00004000
THREAD_MEM: o = 0x80070000, l = 0x00010000
SDRAM: o = 0x80100000, l = 0x00f00000
}
SECTIONS {
.placeholder: palign(8), fill = 0xaaaaaaaa {. += 4;} > THREAD_MEM
.text > IRAM
.far > IRAM
.const > IRAM
}

I didn't know if the other sections were necessary for user programs. This seems to cover everything my program needs.

[*] where 3/4 of the 1.25ms was the program waiting for the thread to be rescheduled; actual running time was probably more like 0.4 ms, so a ~10x speedup.

Group: DynoMotion Message: 9403 From: Tom Kerekes Date: 4/12/2014
Subject: Re: Linking in code compiled with cl6x or gcc
Nice.  Wow you are the Master of makefiles!

It looks correct to me.  I think those are all the sections needed for simple C code.

I don't see why you say only a 10X speedup.  Both methods should only get about 1/4 of the CPU.

BTW it now seems the Code Generation Tools (Compiler without CCS) may be a free download.
https://www-a.ti.com/downloads/sds_support/TICodegenerationTools/download.htm

Thanks,
TK

Group: DynoMotion Message: 9404 From: k_dm927 Date: 4/12/2014
Subject: Re: Linking in code compiled with cl6x or gcc
Yeah, I'm not sure whether I'm thinking of the speedup in a useful way. My thoughts were:

My code has a 1 kHz loop. It calls WaitUntil(x), incrementing x by 0.001 each time, so the thread should wake up as soon as possible after each 1 ms period ends. This period might end while the thread is scheduled, or it might end somewhere in the remaining 72% of the time.

Built with TCC, the code takes 1250 us of real time to get from "gather inputs" to "command axes". In 1250 us, the KFLOP will have gone through its servo-system-servo-user cycle about 7 times, during which the user thread will have run for about 7 x 50us = 350us. But when the optimized code is running, there's a good chance that the 37us it needs will fit inside a single 50us thread scheduling period (because the timer probably expired before the thread was called, and the thread can wake up immediately once it's scheduled).

The ratio in real time is 1250/37 = 33.8, but the ratio in time spent actively calculating my outputs is 350/37 = 9.5.
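
The two ratios above can be reproduced with a small host-side C sketch. The function names are made up for illustration and are not part of the KFLOP API; the inputs (about 7 scheduling cycles, a 50 us user slice per cycle, 1250 us and 37 us run times) are the rough figures quoted in this message, not measured constants.

```c
#include <assert.h>

/* Hypothetical helper: time the user thread actually spends executing
 * during a span of wall-clock time, given how many scheduling cycles the
 * span covers and the thread's slice per cycle (microseconds). */
double active_time_us(int cycles, double slice_us)
{
    assert(cycles >= 0 && slice_us >= 0.0);
    return cycles * slice_us;
}

/* Hypothetical helper: ratio of two durations, slow / fast. */
double speedup(double slow_us, double fast_us)
{
    assert(fast_us > 0.0);
    return slow_us / fast_us;
}
```

With these, speedup(1250.0, 37.0) gives the real-time ratio of roughly 33.8, and speedup(active_time_us(7, 50.0), 37.0) gives the active-computation ratio of roughly 9.5.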
Group: DynoMotion Message: 9406 From: Tom Kerekes Date: 4/12/2014
Subject: Re: Linking in code compiled with cl6x or gcc
Oops.  I think you are exactly correct.  I misread your "0.037 vs 1.25 ms" as 0.037 seconds vs 1.25 milliseconds.

BTW it looks like your makefile specifies -O2 for optimization.  I always use -O3.  Have you tried -O3?  I suppose it wouldn't actually matter to you since your calculation completes in one time slice anyway. 

Regards
TK